1.  INTRODUCTION


One  of  the  fundamental  theorems  of  information  theory  is  Shannon’s  coding  theorem 
(Shannon  and  Weaver  1949)  for  noisy  channels.  Using  random  coding  arguments,  Shannon 
discovered  the  original  version,  and  later,  Fano  (1961)  stated  it  in  a  stronger  version: 


For  any  stationary  channel  with  finite  memory,  a  channel  capacity 
C  can  be  defined  having  the  following  significance.  For  any  binary 
transmission  rate  R  smaller  than  C,  the  probability  of  error  per  digit 
can  be  made  arbitrarily  small  by  properly  designing  the  channel  encoder 
and decoder. Conversely, the probability of error cannot be made
arbitrarily  small  when  R  is  greater  than  C. 


The average error probability of the best block codes on the noisy channel can be bounded
as follows:

$$ P_e \le \exp[-nE(R)] , $$

where n is the length of a code word and E(R), the random coding exponent, is positive for all
rates  R  less  than  capacity  C.  The  existence  of  such  exponential  error  bounds  indicates  a  useful 
communications  channel.  Gallager  (1965)  pioneered  a  very  elegant  derivation  of  this  random 
coding exponent, using a novel upper bound to the error probability. In another paper, Forney
(1968)  generalized  Gallager's  exponential  error  bounds  for  generalized  decoding  schemes, 
namely  decoding  with  erasure,  list  decoding,  and  decision  feedback  schemes.  Much  of  this  work 
uses  random  coding  arguments  in  which  each  input  message  is  represented  by  a  code  word 
constructed  by  selecting  n  symbols  from  an  alphabet  of  independent,  identically  distributed 
symbols.  The  error  probability  of  the  channel  and  coding  scheme  is  averaged  over  the  ensemble 
of  all  randomly  chosen  codes,  and  there  must  be  at  least  one  nonrandom  code  with  error 
probability  as  small  as  the  ensemble  average. 

In  all  of  the  previously  mentioned  papers  and  in  most  literature  on  information  theory,  the 
assumption  is  that  the  statistical  model  of  the  noisy  channel,  expressed  by  the  probability 
transition  matrix,  is  completely  known  (i.e.,  that  the  channel  is  statistically  describable).  The 
capacities of channels which are not so describable have been investigated by Blackwell,
Breiman, and Thomasian (1960), Stiglitz (1967), and many others. A channel for which the
transition  matrix  can  change  with  each  use  is  often  known  as  an  arbitrarily-varying  channel 
(Blackwell,  Breiman,  and  Thomasian  1960).  Of  potentially  practical  interest  is  the  channel,  also 
not  statistically  describable,  in  which  the  probability  transition  matrix  remains  fixed  over  one  code 
word.  This  is  the  so-called  "fixed  unknown"  channel  (Blackwell,  Breiman,  and  Thomasian  1960), 
now  called  the  "compound"  channel  (Gallager  1965). 

By another approach, followed by Kazakos (1981) and others, various authors analyze
the performance of transmission through noisy channels when an inaccurate version of the
probability transition matrix is used by the decoder. This is termed "mismatch." Kazakos
(1981)  derived  upper  and  lower  bounds  for  transmission  through  channels  in  the  presence  of 
mismatch.  Kazakos  (1981)  also  found  the  necessary  and  sufficient  conditions  for  the  error 
probability  of  a  random  code  to  converge  to  zero  with  increasing  block  length.  They  were 
expressed  in  terms  of  distances  between  the  actual  and  assumed  channel  probability  transition 
matrices.  In  the  present  paper,  we  obtain  exponential  error  bounds  for  generalized  decoding 
schemes  of  the  type  considered  by  Forney  (1968)  but  for  the  case  of  mismatch. 

2.  GENERALIZED  DECODING 

We  consider  a  noisy,  discrete  channel  chosen  to  be  memoryless  for  the  present  report. 
Generalizations  will  follow  in  subsequent  work.  Let 

$$ P = \{ P(y_i = b \mid x_i = a) = p_{ab};\ a, b = 1, \ldots, B \} $$


be  the  transition  probability  matrix  of  the  noisy  channel.  Let  us  consider  a  block  code  of  block 
size  n,  with  the  following  code  words: 


$$ X_1, \ldots, X_M;\quad M = e^{nR};\quad X_m = (x_{m1}, \ldots, x_{mn}), $$

where R = rate of the code. The probability of receiving a block $y = (y_1, \ldots, y_n)$ when $X_m$ was
transmitted is as follows:


$$ P(y \mid X_m) = \prod_{i=1}^{n} P(y_i \mid x_{mi}) . \tag{1} $$
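The product form of Equation 1 can be sketched in a few lines of Python. This is only an illustration: the function name `block_likelihood`, the zero-based letter indices, and the 2 x 2 matrix are assumptions made here, not part of the report. The matrix is stored as `P[a][b]`, with row a the input letter and column b the output letter.

```python
from math import prod

def block_likelihood(P, x, y):
    """Return P(y | x) for a memoryless channel (Equation 1).

    P[a][b] is the probability of output letter b given input letter a;
    x and y are equal-length sequences of zero-based letter indices.
    """
    return prod(P[a][b] for a, b in zip(x, y))

# Hypothetical binary channel used only for illustration.
P = [[0.9, 0.1],
     [0.2, 0.8]]
x_m = [0, 1, 1]   # transmitted code word
y = [0, 1, 0]     # received block
print(block_likelihood(P, x_m, y))  # the product 0.9 * 0.8 * 0.2
```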


We will assume that all entries of P are positive, i.e.,

$$ p_0 = \min_{a,b} p_{ab} > 0 . \tag{2} $$


Ordinarily, maximum likelihood decoding selects the message m that maximizes the likelihood
$P(y \mid X_m)$. We assume that the prior probabilities of the M code words are equal: $\pi_m = M^{-1}$.
Maximum likelihood decoding divides the decision space S into disjoint regions $\{R_1, \ldots, R_M\}$ by the
following:

$$ y \in R_m \ \text{iff}\ P(y \mid X_m) > P(y \mid X_v)\ \text{for all}\ v \ne m . \tag{3} $$

In the present report, we will assume that an inaccurate version Q of the true transition
probability matrix P is used in decoding. Let

$$ Q = \{ Q(y_i = b \mid x_i = a) = q_{ab};\ a, b = 1, \ldots, B \} \tag{4} $$

be the entries of the nominal probability transition matrix used in decoding. Naturally,
$q_{ab} \ne p_{ab}$ for at least one pair of entries.

For maximum likelihood decoding under mismatch, the decision space S is separated into a
different set of disjoint regions $\{\tilde{R}_1, \ldots, \tilde{R}_M\}$ such that $\bigcup_{i=1}^{M} \tilde{R}_i = S$ and
$\tilde{R}_i \cap \tilde{R}_j = \emptyset$ for $i \ne j$. These regions are defined by the following:

$$ y \in \tilde{R}_m \ \text{iff}\ Q(y \mid X_m) > Q(y \mid X_v)\ \text{for all}\ v \ne m . \tag{5} $$


(As indicated earlier, this is one form of the fixed unknown channel [Blackwell, Breiman, and
Thomasian 1960].)
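The difference between the matched rule of Equation 3 and the mismatched rule of Equation 5 can be illustrated with a small sketch. The code below is not from the report: the helper names, the two-word repetition code, and the numerical matrices are all hypothetical, and ties (which the strict inequalities leave unspecified) are broken toward the smallest index as an arbitrary choice.

```python
from math import prod

def block_likelihood(W, x, y):
    """Memoryless block likelihood W(y | x); W[a][b] = prob. of output b given input a."""
    return prod(W[a][b] for a, b in zip(x, y))

def ml_decode(W, codebook, y):
    """Return the index m that maximizes W(y | X_m).

    Passing the true matrix P gives the matched decoder (Equation 3);
    passing the assumed matrix Q gives the mismatched decoder (Equation 5).
    Ties go to the smallest index (an assumption made for this sketch).
    """
    scores = [block_likelihood(W, x, y) for x in codebook]
    return max(range(len(codebook)), key=scores.__getitem__)

# Hypothetical true and assumed channels, and a length-3 repetition code.
P = [[0.9, 0.1], [0.4, 0.6]]
Q = [[0.6, 0.4], [0.1, 0.9]]
codebook = [(0, 0, 0), (1, 1, 1)]
y = (0, 1, 1)

print(ml_decode(P, codebook, y))  # decision using the true channel
print(ml_decode(Q, codebook, y))  # decision using the mismatched channel
```

For this particular received block the two decoders disagree, which is the situation the mismatch analysis is meant to quantify.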

Two generalized decision rules were considered by Forney (1968). The first one is the
inclusion of an erasure option. In this case, an additional region $R_0$ is included to represent the
event that no transmitted message is to be assigned to the received word because the value
of the latter cannot be known reliably; if $y \in R_0$, we declare an erasure. Thus, M + 1 outcomes
are possible. The M + 1 regions $\{R_0, \ldots, R_M\}$ are disjoint and cover all the space S:

$$ \bigcup_{j=0}^{M} R_j = S , \quad R_i \cap R_j = \emptyset , \quad i \ne j , \quad i, j = 0, 1, \ldots, M . $$


Let $E_2$ denote the event of an undetected error; this is the event that $y \in R_m$ and that some code word
$X_k$, $k \ne m$, was actually transmitted. That is, the decoder believes that it has correctly decoded
because it has produced a code word according to the decoding algorithm. However, the code
word thus produced is not the code word that was transmitted. The probability of $E_2$ can be
expressed as follows:

$$ P[E_2] = \sum_{m=1}^{M} \sum_{y \in R_m} \sum_{k \ne m} P(y \mid X_k) P(X_k) . \tag{6} $$


Let $E_1$ be the event in which the received word y does not fall in the decision region $R_m$
corresponding to the transmitted code word $X_m$; the probability of $E_1$ is as follows:

$$ P[E_1] = \sum_{m=1}^{M} \sum_{y \notin R_m} P(y \mid X_m) P(X_m) . \tag{7} $$

If $E_1$ occurs, either an undetected error or an erasure must ensue; hence, the probability of an
erasure is as follows:

$$ P[\text{erasure}] = P[E_1] - P[E_2] \ge 0 . $$


The problem in choosing the regions $\{R_1, \ldots, R_M\}$ is now formulated. We wish to minimize $P[E_1]$
for a given $P[E_2]$ or vice versa. It is clear that increasing $R_m$ will increase $P[E_2]$ but decrease
$P[E_1]$; hence, we have a variation of the Neyman-Pearson problem (Van Trees 1968).

The second type of generalized decoding is list decoding. Here, the decision regions $\{R_1, \ldots, R_M\}$
overlap; hence, for each received word y, a list of code words is produced. The list contains at
least one code word; the size of the list varies, as will be explained. The performance of list
decoding is evaluated through two event probabilities. A list error is the event in which the
transmitted code word is not on the list or, equivalently, in which the received word y is not in the
decision region $R_m$ corresponding to the transmitted code word $X_m$. This is the event $E_1$, with
probability given by Equation 7. The second probability, that some code word $X_m$ will be on the
list although some other code word $X_k$, $k \ne m$, was sent, is as follows:

$$ P(X_m \text{ on list and incorrect}) = \sum_{y \in R_m} \sum_{k \ne m} P(y \mid X_k) P(X_k) . $$

The average number $\bar{L}$ of incorrect code words on the list is as follows:

$$ \bar{L} = \sum_{m=1}^{M} P(X_m \text{ on list and incorrect}) = \sum_{m=1}^{M} \sum_{y \in R_m} \sum_{k \ne m} P(y \mid X_k) P(X_k) . \tag{8} $$

We observe that the expression (Equation 8) for $\bar{L}$ is identical to the expression (Equation 6) for
$P[E_2]$, where $P[E_2]$ is no longer a probability but represents $\bar{L}$. In the sequel, we will use $P[E_2]$
to denote both cases.
Thus, we have a unified formulation of decoding with erasure and list decoding. Forney
(1968) proved that the optimum regions $\{R_m\}$, found under the criterion of minimizing
Equation 6 with Equation 7 held constant or vice versa, are as follows:

$$ y \in R_m \ \text{iff}\ \frac{P(y \mid X_m) P(X_m)}{\sum_{k \ne m} P(y \mid X_k) P(X_k)} \ge e^{nT} , \tag{9} $$

where n = block length and T = an arbitrary parameter.

An equivalent way of describing the decision regions (Equation 9) is through the posterior
probabilities $P(X_m \mid y)$:

$$ y \in R_m \ \text{iff}\ P(X_m \mid y) \ge u , \tag{10} $$

where $u = e^{nT} / (1 + e^{nT})$.

In ordinary decoding, we decode into the code word $X_m$ for which $P(X_m \mid y)$ is greatest. With
the erasure option, we guess the code word $X_m$ for which $P(X_m \mid y)$ is greatest, so long as
$P(X_m \mid y) \ge u$, $u > 1/2$. This corresponds to $T > 0$. With list decoding, to minimize the average
list size for a given list error probability, we put on the list all code words for which $P(X_m \mid y) \ge u$,
$u \le 1/2$. This corresponds to $T \le 0$.

Thus, the regions $R_m$ defined by Equation 9 are optimal for regular decoding ($T = 0$), list
decoding ($T < 0$ and overlapping regions), and decoding with erasure option ($T > 0$). Note that in order
to define the decision regions we need to know the probability transition matrix P. In the
mismatch situation, we utilize Q instead of P. We assume, for simplicity, from this point on, equal
prior probabilities: $P(X_m) = M^{-1}$; hence, the decision regions (under mismatch) are as follows:

$$ y \in \tilde{R}_m \ \text{iff}\ \frac{Q(y \mid X_m)}{\sum_{k \ne m} Q(y \mid X_k)} \ge e^{nT} , \tag{11} $$

where $\tilde{R}_m$ denotes the decision region based on mismatch. For Q = P, we have $\tilde{R}_m = R_m$.
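The threshold rule for the mismatched regions can be sketched as a single function that returns every index m whose likelihood ratio under Q clears the threshold exp(nT). This is a minimal illustration under assumptions made here: the names `generalized_decode` and `block_likelihood`, the repetition code, and the matrix Q are hypothetical. An empty result is read as an erasure (possible when T > 0), and a result with several indices is read as a list (possible when T < 0).

```python
from math import exp, prod

def block_likelihood(W, x, y):
    """Memoryless block likelihood W(y | x); W[a][b] = prob. of output b given input a."""
    return prod(W[a][b] for a, b in zip(x, y))

def generalized_decode(Q, codebook, y, T):
    """Return all m with Q(y|X_m) >= exp(n*T) * sum_{k != m} Q(y|X_k).

    Equal priors are assumed, so they cancel from the ratio.
    """
    n = len(y)
    scores = [block_likelihood(Q, x, y) for x in codebook]
    total = sum(scores)
    threshold = exp(n * T)
    return [m for m, q in enumerate(scores) if q >= threshold * (total - q)]

# Hypothetical assumed channel and code, for illustration only.
Q = [[0.6, 0.4], [0.1, 0.9]]
codebook = [(0, 0, 0), (1, 1, 1)]
y = (0, 1, 1)

print(generalized_decode(Q, codebook, y, 0.0))   # ordinary decoding
print(generalized_decode(Q, codebook, y, 0.2))   # erasure option
print(generalized_decode(Q, codebook, y, -0.2))  # list decoding
```

Using one function for all three modes mirrors the unified formulation in the text: only the sign of T changes.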


3.  BOUNDS  UNDER  MISMATCH 


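We will now generalize Forney's upper bounds for the mismatched case. We have the
following two probabilities to upper-bound:

$$ P[E_1] = M^{-1} \sum_{m=1}^{M} \sum_{y \notin \tilde{R}_m} P(y \mid X_m) , \tag{12} $$

$$ P[E_2] = M^{-1} \sum_{m=1}^{M} \sum_{y \in \tilde{R}_m} \sum_{k \ne m} P(y \mid X_k) . \tag{13} $$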
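For a very small code, the two quantities in Equations 12 and 13 can be evaluated by brute force, which is a convenient way to sanity-check any bound. The sketch below is an illustration under stated assumptions: the decision regions are taken to be the threshold regions built from Q with equal priors, the probabilities are computed with the true matrix P, and all names and numerical values are hypothetical.

```python
from itertools import product
from math import exp, prod

def block_likelihood(W, x, y):
    """Memoryless block likelihood W(y | x); W[a][b] = prob. of output b given input a."""
    return prod(W[a][b] for a, b in zip(x, y))

def error_probabilities(P, Q, codebook, T):
    """Return (P[E1], P[E2]) by enumerating every received block y.

    Regions come from the assumed channel Q and threshold exp(n*T);
    probabilities come from the true channel P (Equations 12 and 13).
    """
    M, n = len(codebook), len(codebook[0])
    num_outputs = len(P[0])
    threshold = exp(n * T)
    p_e1 = 0.0
    p_e2 = 0.0
    for y in product(range(num_outputs), repeat=n):
        q = [block_likelihood(Q, x, y) for x in codebook]
        p = [block_likelihood(P, x, y) for x in codebook]
        q_total, p_total = sum(q), sum(p)
        for m in range(M):
            if q[m] >= threshold * (q_total - q[m]):
                p_e2 += (p_total - p[m]) / M   # y in region m, other words sent
            else:
                p_e1 += p[m] / M               # word m sent, y outside region m
    return p_e1, p_e2

# Hypothetical channels and code, for illustration only.
P = [[0.9, 0.1], [0.4, 0.6]]
Q = [[0.6, 0.4], [0.1, 0.9]]
codebook = [(0, 0, 0), (1, 1, 1)]
e1, e2 = error_probabilities(P, Q, codebook, 0.1)
print(e1, e2, e1 - e2)  # the last value is the erasure probability when T > 0
```

The enumeration costs (number of output letters)^n evaluations, so it is only practical for toy block lengths.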


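Let us define the following functions:

$$ S_m \triangleq S_m(y) \triangleq \sum_{k \ne m} P(y \mid X_k) , $$

$$ Z_m \triangleq Z_m(y) \triangleq \sum_{k \ne m} Q(y \mid X_k) , $$

$$ Q_m \triangleq Q(y \mid X_m) , $$

and

$$ P_m \triangleq P(y \mid X_m) . $$

Note that

$$ \sum_{y} S_m(y) = \sum_{y} Z_m(y) = M - 1 . \tag{14} $$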
Hence, if we divide $S_m(y)$ and $Z_m(y)$ by M - 1, they become probability distributions; more
specifically, they become mixtures of M - 1 distributions, with equal mixing parameters.

If we define the indicator function as follows:

$$ \phi_m(y) = \begin{cases} 1 & \text{for } y \in \tilde{R}_m \text{ (that is, for } Q_m Z_m^{-1} \ge e^{nT}\text{)} , \\ 0 & \text{otherwise,} \end{cases} \tag{15} $$


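then the expressions for $P[E_1]$, $P[E_2]$ can be written as follows:

$$ V_1 \triangleq P[E_1] = M^{-1} \sum_{m=1}^{M} \sum_{y} [1 - \phi_m(y)] P(y \mid X_m) , \tag{16} $$

$$ V_2 \triangleq (M - 1)^{-1} P[E_2] = M^{-1} \sum_{m=1}^{M} \sum_{y} \phi_m(y) S_m(y) (M - 1)^{-1} . \tag{17} $$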


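Note that

$$ \sum_{y} S_m(y) (M - 1)^{-1} = 1 \quad \text{and} \quad S_m(y) (M - 1)^{-1} \ge 0 . $$

Hence, the normalized $S_m$ behaves like a probability distribution function. We observe that

$$ 1 - V_1 = M^{-1} \sum_{m=1}^{M} \sum_{y} \phi_m(y) P(y \mid X_m) , \tag{18} $$

$$ 1 - V_2 = M^{-1} \sum_{m=1}^{M} \sum_{y} [1 - \phi_m(y)] (M - 1)^{-1} S_m(y) . \tag{19} $$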

For s > 0, we have the following bounds:

$$ \phi_m(y) \le \left( Z_m^{-1} Q_m e^{-nT} \right)^{s} , \tag{20} $$

$$ 1 - \phi_m(y) \le \left( Q_m^{-1} Z_m e^{nT} \right)^{s} . \tag{21} $$
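The two indicator bounds rest on a simple observation: when the ratio Q_m / (Z_m exp(nT)) is at least 1 the indicator equals 1 and any nonnegative power of the ratio is at least 1, and when the ratio is below 1 the indicator is 0 and the power is still nonnegative. The short sketch below checks this numerically; the function name and the sample values are assumptions made for illustration.

```python
from math import exp

def indicator_and_bounds(q_m, z_m, n, T, s):
    """Return (phi, bound_on_phi, bound_on_one_minus_phi) for one received block.

    q_m plays the role of Q(y|X_m) and z_m the role of sum_{k != m} Q(y|X_k);
    both must be positive. s is a nonnegative exponent.
    """
    ratio = q_m / (z_m * exp(n * T))
    phi = 1 if ratio >= 1.0 else 0
    bound_phi = ratio ** s                   # right side of Equation 20
    bound_complement = (1.0 / ratio) ** s    # right side of Equation 21
    return phi, bound_phi, bound_complement

# Hypothetical values: n = 4, T = 0.1, s = 0.5.
phi, b20, b21 = indicator_and_bounds(0.02, 0.05, 4, 0.1, 0.5)
print(phi, b20, b21)
assert phi <= b20 and 1 - phi <= b21
```

The free exponent s is what is later optimized to tighten the resulting bounds.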

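Using the bounds of Equations 20 and 21 in Equations 16, 17, 18, and 19, we obtain the following
inequalities:

$$ V_1 \le M^{-1} \sum_{m=1}^{M} \sum_{y} \left( Q_m^{-1} Z_m e^{nT} \right)^{s} P_m , \tag{22} $$

$$ 1 - V_1 \le M^{-1} \sum_{m=1}^{M} \sum_{y} \left( Z_m^{-1} Q_m e^{-nT} \right)^{s} P_m , \tag{23} $$

$$ V_2 \le M^{-1} \sum_{m=1}^{M} \sum_{y} \left( Z_m^{-1} Q_m e^{-nT} \right)^{s} S_m (M - 1)^{-1} , \tag{24} $$

$$ 1 - V_2 \le M^{-1} \sum_{m=1}^{M} \sum_{y} \left( Z_m Q_m^{-1} e^{nT} \right)^{s} (M - 1)^{-1} S_m . \tag{25} $$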

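We define the following functions:

$$ g_{1m}(s) = \sum_{y} \left[ (M - 1)^{-1} Z_m Q_m^{-1} \right]^{s} P_m , \tag{26} $$

$$ g_{2m}(s) = \sum_{y} \left[ (M - 1) Z_m^{-1} Q_m \right]^{s} (M - 1)^{-1} S_m . \tag{27} $$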

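We then obtain the following inequalities:

$$ V_1 \le M^{-1} e^{nTs} (M - 1)^{s} \sum_{m=1}^{M} g_{1m}(s) , \tag{28} $$

$$ 1 - V_1 \le M^{-1} e^{-nTs} (M - 1)^{-s} \sum_{m=1}^{M} g_{1m}(-s) , \tag{29} $$

$$ V_2 \le M^{-1} e^{-nTs} (M - 1)^{-s} \sum_{m=1}^{M} g_{2m}(s) , \tag{30} $$

$$ 1 - V_2 \le M^{-1} e^{nTs} (M - 1)^{s} \sum_{m=1}^{M} g_{2m}(-s) . \tag{31} $$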

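We will utilize a form of Jensen's inequality (Korevaar 1968), which states the following:

$$ \left[ \sum_{y} a_y^{q} \right]^{1/q} \le \sum_{y} a_y , \quad q \ge 1 , \quad a_y \ge 0 . $$

If we introduce a new parameter $\rho \ge s$ and write $Q_k \triangleq Q(y \mid X_k)$, we have, by Jensen's inequality, the following:

$$ \begin{aligned}
g_{1m}(s) &= \sum_{y} \Big\{ (M - 1)^{-1} \sum_{k \ne m} Q_k \Big\}^{s} Q_m^{-s} P_m \\
&\le \sum_{y} P_m Q_m^{-s} \Big\{ \sum_{k \ne m} \left[ (M - 1)^{-1} Q_k \right]^{s/\rho} \Big\}^{\rho} \\
&= \sum_{y} P_m^{1-s} \left[ P_m Q_m^{-1} \right]^{s} \Big\{ \sum_{k \ne m} \left[ (M - 1)^{-1} Q_k \right]^{s/\rho} \Big\}^{\rho} .
\end{aligned} \tag{32} $$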


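Let also $q \ge 1 - s$ and write $P_k \triangleq P(y \mid X_k)$. By the same argument,

$$ \begin{aligned}
g_{2m}(s) &= \sum_{y} \left[ (M - 1) Z_m^{-1} Q_m \right]^{s} (M - 1)^{-1} S_m \\
&= \sum_{y} \left[ S_m Z_m^{-1} \right]^{s} Q_m^{s} \left[ (M - 1)^{-1} S_m \right]^{1-s} \\
&= \sum_{y} \left[ S_m Z_m^{-1} \right]^{s} Q_m^{s} \Big\{ \Big[ (M - 1)^{-1} \sum_{k \ne m} P_k \Big]^{(1-s)/q} \Big\}^{q} \\
&\le \sum_{y} \left[ S_m Z_m^{-1} \right]^{s} Q_m^{s} \Big\{ \sum_{k \ne m} \left[ (M - 1)^{-1} P_k \right]^{(1-s)/q} \Big\}^{q} .
\end{aligned} \tag{33} $$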
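We consider probability transition matrices for which a finite lower bound exists to every entry.
Then, there exists a finite number B that is an upper bound to $[P_m Q_m^{-1}]^{s}$ and $[S_m Z_m^{-1}]^{s}$. We
obtain the following upper bounds:

$$ g_{1m}(s) \le B \sum_{y} P_m^{1-s} \Big\{ \sum_{k \ne m} \left[ (M - 1)^{-1} Q_k \right]^{s/\rho} \Big\}^{\rho} , \tag{34} $$

$$ g_{2m}(s) \le B \sum_{y} Q_m^{s} \Big\{ \sum_{k \ne m} \left[ (M - 1)^{-1} P_k \right]^{(1-s)/q} \Big\}^{q} . \tag{35} $$

Multiplying by $(M - 1)^{s}$ and $(M - 1)^{1-s}$, respectively, we find the following:

$$ (M - 1)^{s} g_{1m}(s) \le B \sum_{y} P_m^{1-s} \Big\{ \sum_{k \ne m} Q_k^{s/\rho} \Big\}^{\rho} , \tag{36} $$

$$ (M - 1)(M - 1)^{-s} g_{2m}(s) \le B \sum_{y} Q_m^{s} \Big\{ \sum_{k \ne m} P_k^{(1-s)/q} \Big\}^{q} . \tag{37} $$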


Note from Equations 26 and 28 that the products $(M - 1)^{s} g_{1m}(s)$, $(M - 1)^{-s} g_{2m}(s)$ appear in the
upper bounds; hence, we are interested in bounding them directly.

At this point, we need to resort to random coding arguments. As is customary, and following
Gallager's (1965) and Forney's (1968) approaches, we choose a code at random by choosing
each input letter of each code word by a random selection in which the probability of choosing
input k is $p_k$. Denoting the average by an overbar, a modification of the approach in Forney's
paper (Forney 1968) yields the following:


$$ (M - 1)^{s}\, \overline{g_{1m}(s)} \le B \exp(\rho n R) \Big\{ \sum_{j} \Big[ \sum_{k} p_k p_{jk}^{1-s} \Big] \Big[ \sum_{k} p_k q_{jk}^{s/\rho} \Big]^{\rho} \Big\}^{n} , \tag{38} $$

where

$$ P = \{ p_{jk} \} , \quad Q = \{ q_{jk} \} $$

are the true and assumed channel probability transition matrices, respectively. This is in
agreement with Forney's bound if P = Q (matched case) and, hence, B = 1.

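A similar upper bound is produced for $\overline{g_{2m}(s)}$:

$$ (M - 1)(M - 1)^{-s}\, \overline{g_{2m}(s)} \le B \exp(q n R) \Big\{ \sum_{j} \Big[ \sum_{k} p_k p_{jk}^{(1-s)/q} \Big]^{q} \Big[ \sum_{k} p_k q_{jk}^{s} \Big] \Big\}^{n} , \tag{39} $$

where

$$ q \ge 1 - s , \quad \rho \ge s \ge 0 , \quad s \le 1 , $$

and R = code rate; $R = n^{-1} \log M$.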


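The upper bounds for $V_1$, $V_2$, averaged over the random choice of code words, are as follows:

$$ \overline{V_1} \le B \exp[n(Ts + \rho R)] \Big\{ \sum_{j} \Big[ \sum_{k} p_k p_{jk}^{1-s} \Big] \Big[ \sum_{v} p_v q_{jv}^{s/\rho} \Big]^{\rho} \Big\}^{n} , \tag{40} $$

$$ (M - 1)\, \overline{V_2} \le B \exp[n(qR - Ts)] \Big\{ \sum_{j} \Big[ \sum_{k} p_k p_{jk}^{(1-s)/q} \Big]^{q} \Big[ \sum_{v} p_v q_{jv}^{s} \Big] \Big\}^{n} . \tag{41} $$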


Note that the previous bounds (Equations 40 and 41) will converge exponentially to zero if the
functions

$$ f_1(s) = \sum_{j} \Big[ \sum_{k} p_k p_{jk}^{1-s} \Big] \Big[ \sum_{k} p_k q_{jk}^{s/\rho} \Big]^{\rho} , \quad 1 \ge \rho \ge s \ge 0 , \tag{42} $$

and

$$ f_2(s) = \sum_{j} \Big[ \sum_{k} p_k p_{jk}^{(1-s)/q} \Big]^{q} \Big[ \sum_{k} p_k q_{jk}^{s} \Big] , \quad q \ge 1 - s , \quad s \ge 0 , \tag{43} $$

are both less than 1 for some $(s, \rho, q) = (s^0, \rho^0, q^0)$. This is a sufficient condition for both
quantities $V_1$, $V_2$ to converge to zero for some random block code of size n, as $n \to \infty$. We observe
that for any $\rho$, $f_1(0) = 1$ and, for q = 1, $f_2(0) = 1$.

A sufficient condition that both $V_1$ and $V_2$ converge exponentially to zero in spite of mismatch
is that for a pair (P, Q), we have the following:

$$ \min f_1(s) \exp(Ts + \rho R) < 1 , \tag{44} $$

$$ \min f_2(s) \exp(qR - Ts) < 1 , \tag{45} $$

where the min are over all three parameters $(\rho, q, s)$.

Due to the stated properties of $f_1(s)$ and $f_2(s)$, it is always guaranteed that for P, Q sufficiently
close, the bounds of Equations 40 and 41 will converge exponentially to zero as n becomes
infinite.
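The convergence conditions can be explored numerically for a given pair of matrices. The sketch below is hedged in several ways. It assumes that f_1 and f_2 have the Forney-style form sum_j [sum_k p_k p_jk^(1-s)] [sum_k p_k q_jk^(s/rho)]^rho and sum_j [sum_k p_k p_jk^((1-s)/q)]^q [sum_k p_k q_jk^s], and that the exponential factors are exp(Ts + rho R) and exp(qR - Ts); these forms are read from a damaged copy of the report and may differ from the original. It also evaluates the conditions at a single parameter point instead of minimizing over (rho, q, s). Matrices are stored as `W[k][j]` (input k, output j), and every name and number is hypothetical.

```python
from math import exp

def f1(P, Q, p_in, s, rho):
    """Assumed form of f_1; requires 1 >= rho >= s >= 0 and rho > 0."""
    total = 0.0
    for j in range(len(P[0])):
        a = sum(pk * P[k][j] ** (1.0 - s) for k, pk in enumerate(p_in))
        b = sum(pk * Q[k][j] ** (s / rho) for k, pk in enumerate(p_in))
        total += a * b ** rho
    return total

def f2(P, Q, p_in, s, q):
    """Assumed form of f_2; requires q >= 1 - s, s >= 0 and q > 0."""
    total = 0.0
    for j in range(len(P[0])):
        a = sum(pk * P[k][j] ** ((1.0 - s) / q) for k, pk in enumerate(p_in))
        b = sum(pk * Q[k][j] ** s for k, pk in enumerate(p_in))
        total += a ** q * b
    return total

def conditions_hold_at(P, Q, p_in, R, T, s, rho, q):
    """Check both per-letter conditions at one parameter point (no minimization)."""
    c1 = f1(P, Q, p_in, s, rho) * exp(T * s + rho * R)
    c2 = f2(P, Q, p_in, s, q) * exp(q * R - T * s)
    return c1 < 1.0 and c2 < 1.0

# Hypothetical binary symmetric channel, mismatched decoder, uniform inputs.
P = [[0.99, 0.01], [0.01, 0.99]]
Q = [[0.90, 0.10], [0.10, 0.90]]
p_in = [0.5, 0.5]
print(f1(P, Q, p_in, 0.0, 0.7), f2(P, Q, p_in, 0.0, 1.0))  # both equal 1 at s = 0
print(conditions_hold_at(P, Q, p_in, R=0.05, T=0.0, s=0.5, rho=1.0, q=1.0))
```

A grid search over (rho, q, s) wrapped around `conditions_hold_at` would approximate the minimization in the text; finding one point where both products are below 1 is already enough for the sufficient condition.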

